EXECUTIVE SUMMARY


The Android™ Operating System (OS) is by far the most popular OS for mobile devices. However, the low barrier to entry into the Android™ app marketplace also makes it easy to distribute malware. This report surveys malware distribution methods and then describes current work on determining the risk that an “app” (application) is infected. The app analysis methods discussed include code analysis and behavioral analysis.
1. INTRODUCTION


The Android™ Operating System (OS) is the most popular platform for mobile devices, currently holding at least 80% of the market share [1]; its user base continues to grow and is expected to exceed 1.6 billion users by 2020 [2]. The platform offers developers an open entry point into the “app” (application) market, which also gives malicious actors access to millions (potentially billions) of devices [3].

This report examines risk metrics for applications in the Android™ OS ecosystem and is organized as follows. First, common malware distribution methods are explained and mitigations are suggested. Then static code analysis and behavior-based analysis methods for risk assessment are surveyed. Several other approaches to risk analysis, such as crowdsourced risk analysis, are also surveyed.
2. MALWARE AND ITS DISTRIBUTION


Malware types include trojans, backdoors, worms, botnets, spyware, adware, and ransomware:

• Trojans appear to be benign apps but perform harmful activities without the user’s knowledge or consent. A recent trend in trojans is attacks on multi-factor authentication for banking transactions, in which the trojan captures usernames and passwords and then steals Mobile Transaction Authentication Numbers (mTANs) to silently complete transactions [4].

• Backdoors allow outsiders to enter the system stealthily, bypassing normal access controls.

• Worms replicate themselves and spread through the network; for instance, Bluetooth worms copy themselves onto paired devices [5].

• Botnet apps allow a device, called a bot, to be controlled by an outside server, so that collections of bots, called botnets, can act in concert. Botnets are often used in Distributed Denial of Service (DDoS) attacks. Botnet apps may also include instructions to download other malicious packages.

• Spyware often presents itself as a useful utility but monitors user activity in the background (e.g., contacts, messages, and locations).

• Adware targets the device user with personalized advertisements, which may degrade device performance.

• Ransomware [6] locks a user’s device, making it inaccessible until a ransom is paid through an online payment service (often using a cryptocurrency such as Bitcoin, which makes payments difficult to trace [7]).

2.1  IS  A  POPULAR  APP  FREE?  THE  THREAT  POSED  BY  REPACKAGING 

Oftentimes, multiple versions of an app will appear in the app store when a potential user searches for it. One version of the app will have a monetary cost, while other versions will be listed as free. The paid version is generally benign, while the free version(s) are often repackaged with adware [8] (not explicitly malicious) or malware [9] (explicitly malicious) inserted into the underlying code. There are reports that 77% of the top 50 free apps in the Google Play Store are repackaged [10]. Repackaging is among the most common methods of Android™ malware distribution [11, 12]; the risk it poses can be minimized by ensuring that only the official, paid-for app is downloaded.

2.2 OTHER DISTRIBUTION TECHNIQUES: DRIVE-BY DOWNLOADS, DYNAMIC PAYLOADS, AND STEALTH TECHNIQUES

In  addition  to  repackaging  of  apps,  malware  may  be  distributed  by  other  means. 

2.2.1  Drive-by  Downloads 

A drive-by download (DbD) occurs when malware is unintentionally downloaded in the background [10]. DbD attacks have been a common tactic for malware distribution and propagation for many years [11]. Traditionally, in a DbD attack an attacker compromises a legitimate Web application and uploads malicious JavaScript™. The ultimate victim becomes infected by sending a Web request to the server, which returns the malicious code along with the legitimately requested Web page; the JavaScript™ then downloads its malicious payloads [13]. Many of these attacks target third-party browser plug-ins, which introduce “low-hanging fruit” vulnerabilities [13, 11] because they may undergo less testing. Many plug-ins are written in languages like C without concern for security best practices and are distributed as executable binaries [13].

DbD attacks on mobile devices do not typically target browser vulnerabilities; instead, they may entice users to download feature-rich apps. Zhou and Jiang [11] found four families of Android™ DbD malware:

• GGTracker uses in-app advertisements to lure users to malicious links. Users are usually subscribed to a premium-rate service without their knowledge [14].

• Jifake is downloaded via malicious Quick Response (QR) codes that redirect the user to a URL hosting it. The malware sends several Short Message Service (SMS) messages to a premium-rate number, and it is the first known case of a malicious QR-code-based attack [15].

• SpyEye-in-the-Mobile (Spitmo) targets online banking mTANs by directing users to download an app that promises safer online banking. Once the app is downloaded, the user’s mTANs and SMS messages are sent to a remote server [16].

• ZeuS-in-the-Mobile (Zitmo) is a variant of the ZeuS botnet and, like Spitmo, targets online banking activity, primarily in Europe. It steals mTANs and sends them to remote servers [17].

2.2.2  Stealth  Malware  Techniques 

Malware developers take advantage of the limited resources of Android™ devices, which limit scanning capabilities. Hardware vulnerabilities and obfuscation of malicious code make it easy to evade anti-malware apps [18]. Some of these techniques include key permutation, native code execution, code encryption, and Java™ reflection [10].

2.2.3  Update  Attacks 

An Update Attack is a malware distribution technique that is highly resistant to static scanning. Instead of shipping the entire malicious package, the app contains only an update component that fetches the malicious payload at runtime [11]. Some examples of Update Attacks include:

• BaseBridge [19] includes one exploit, “rage against the cage,” which downloads a payload application that then attempts a privilege escalation attack to gain root access. BaseBridge transmits personally identifiable information to a remote server [20].

• DroidKungFuUpdate downloads malware through a third-party library that provides legitimate notification functionality [11].

• AnserverBot [21] upgrades certain components within its host app. Because the entire app is not updated, user permission is not required (a distinct difference from BaseBridge and DroidKungFuUpdate). AnserverBot retrieves a public (but encrypted) blog entry containing the payloads for the update.

• Plankton [19] operates similarly to AnserverBot, except that it downloads additional dex files to dynamically extend its capabilities.
3. CODE ANALYSIS METHODS


An effective method for assessing the risk of malware infection in Android™ apps is the comparison of “feature vectors.” Feature vectors summarize important characteristics of an object’s state [22]; the elements of an app’s feature vector are characteristic actions taken by that app. Apps are classified into categories based on their claimed functionality. Benign apps within a category that have similar functions are expected to make similar permission requests, while malicious apps deviate from them. The extent of this deviation can be useful for assessing the risk of malware infection in an app. Weighting mechanisms [23] are useful in feature vector analysis.

Almost all risk assessment approaches for Android™ apps are permissions-based. Reference [24] takes a feature vector approach to risk metrics by considering the vector of permission requests. Apps are separated into 29 categories, and the permission tendencies of benign apps are used to create a baseline permission vector for each category. There are 144 permissions available in the Android™ operating system; however, 53 of these are not available to third-party applications, so the remaining 91 permissions were used for the permission vector. A 91-dimensional Boolean vector was constructed, with a 1 denoting that a permission was requested and a 0 that it was not. Every app is represented in this manner and compared against its category’s baseline permission vector using a weighted Euclidean metric. In the Cartesian coordinate system, if $\mathbf{p} = (p_1, p_2, \ldots, p_n)$ and $\mathbf{q} = (q_1, q_2, \ldots, q_n)$ are points in Euclidean $n$-space, the distance between them is defined by

$$d(\mathbf{p}, \mathbf{q}) = d(\mathbf{q}, \mathbf{p}) = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}. \tag{1}$$

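When certain coordinates are deemed of greater or lesser importance, a weighted Euclidean metric is used, where $w_i$ is the weight assigned to coordinate $i$:

$$d(\mathbf{p}, \mathbf{q}) = d(\mathbf{q}, \mathbf{p}) = \sqrt{\sum_{i=1}^{n} w_i (q_i - p_i)^2}. \tag{2}$$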

Specifically, 24 permissions that were deemed risky were given a weight of $w_i = 2$, while the others were given a standard weight of $w_i = 1$.
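The comparison in [24] can be sketched in Python as follows. This is a minimal illustration, not the authors’ code: the five-permission universe, which permissions count as risky, and the choice to build each category baseline as the per-permission request frequency among that category’s benign apps are all assumptions made for the example (the baseline construction in [24] may differ).

```python
import math
from typing import Sequence


def weighted_euclidean(p: Sequence[float], q: Sequence[float], w: Sequence[float]) -> float:
    """Eq. (2); with every w_i = 1 it reduces to the plain Euclidean metric of Eq. (1)."""
    if not (len(p) == len(q) == len(w)):
        raise ValueError("p, q and w must have the same dimension")
    return math.sqrt(sum(wi * (qi - pi) ** 2 for pi, qi, wi in zip(p, q, w)))


def category_baseline(benign_vectors: list[list[int]]) -> list[float]:
    """Assumed baseline: fraction of a category's benign apps requesting each permission."""
    n = len(benign_vectors)
    return [sum(column) / n for column in zip(*benign_vectors)]


# Hypothetical permission universe; the first two are treated as "risky" (w_i = 2).
PERMISSIONS = ["SEND_SMS", "READ_CONTACTS", "INTERNET", "VIBRATE", "WAKE_LOCK"]
WEIGHTS = [2, 2, 1, 1, 1]

# Hypothetical Boolean permission vectors of benign apps in one category.
benign_games = [
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 0, 1],
    [0, 0, 1, 1, 1],
]
baseline = category_baseline(benign_games)  # [0.0, 0.0, 1.0, 0.75, 0.75]

typical_game = [0, 0, 1, 1, 1]
suspicious_game = [1, 1, 1, 1, 1]  # also asks for SEND_SMS and READ_CONTACTS

print(round(weighted_euclidean(typical_game, baseline, WEIGHTS), 3))     # 0.354
print(round(weighted_euclidean(suspicious_game, baseline, WEIGHTS), 3))  # 2.031
```

Under these assumptions, the app that additionally requests the two risky permissions lies much farther from the category baseline (about 2.03 versus about 0.35), and doubling the weights of risky coordinates amplifies exactly that kind of deviation.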

Two data sets were used to test this methodology. The authors downloaded benign apps from the Tencent Android™ Market, a leading Android™ market in China; this data set consisted of 7,099 verified benign apps in various categories. The second data set consisted of 1,260 malware samples downloaded from the Android™ Malware Genome Project [25], covering the majority of Android™ malware families. The market data set of 7,099 verified benign apps was used mainly to create baseline vectors for each app category.

Sanz et al. [26] use feature vectors of permissions and uses-features for anomaly detection. This information is obtained from the AndroidManifest.xml file that is packaged with every Android™ app. Uses-features, which are optional, declare the hardware or software features used by the app.
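Extracting these two feature sets could look like the sketch below. It assumes the manifest has already been decoded from the binary XML format stored inside the APK into plain XML (for example with a tool such as apktool), and the short vocabularies are hypothetical stand-ins for the full 130-permission and 37-feature lists; only the manifest element and attribute names are Android’s own.

```python
import xml.etree.ElementTree as ET

# android:name attributes are namespaced in the manifest.
ANDROID_NAME = "{http://schemas.android.com/apk/res/android}name"


def manifest_features(manifest_xml: str) -> tuple[set[str], set[str]]:
    """Return (requested permissions, declared uses-features) from a decoded manifest."""
    root = ET.fromstring(manifest_xml)
    perms = {e.get(ANDROID_NAME) for e in root.iter("uses-permission") if e.get(ANDROID_NAME)}
    # Some <uses-feature> entries carry only android:glEsVersion, so skip unnamed ones.
    feats = {e.get(ANDROID_NAME) for e in root.iter("uses-feature") if e.get(ANDROID_NAME)}
    return perms, feats


def to_vector(present: set[str], vocabulary: list[str]) -> list[int]:
    """Boolean feature vector: 1 where the vocabulary entry is present, else 0."""
    return [1 if item in present else 0 for item in vocabulary]


MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.app">
  <uses-permission android:name="android.permission.INTERNET"/>
  <uses-permission android:name="android.permission.SEND_SMS"/>
  <uses-feature android:name="android.hardware.camera" android:required="false"/>
</manifest>"""

# Hypothetical, truncated vocabularies.
PERMISSION_VOCAB = ["android.permission.INTERNET", "android.permission.READ_CONTACTS",
                    "android.permission.SEND_SMS"]
FEATURE_VOCAB = ["android.hardware.camera", "android.hardware.telephony"]

perms, feats = manifest_features(MANIFEST)
print(to_vector(perms, PERMISSION_VOCAB))  # [1, 0, 1]
print(to_vector(feats, FEATURE_VOCAB))     # [1, 0]
```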

A 130-dimensional permission vector and a 37-dimensional uses-feature vector (where available) were generated from this information. “Normality” (baseline) models were developed from the AndroidManifest.xml files of 1,811 verified benign samples, while 2,808 malware-infected samples were tested for anomalies. Multiple metrics were used for anomaly detection and risk analysis, including the Euclidean metric (1), the Manhattan metric, and cosine similarity. The Manhattan metric is the distance between two points $\mathbf{p}$ and $\mathbf{q}$ obtained by summing the lengths of the projections of the line segment between the points onto the coordinate axes:

$$d(\mathbf{p}, \mathbf{q}) = d(\mathbf{q}, \mathbf{p}) = \sum_{i=1}^{n} \lvert p_i - q_i \rvert. \tag{3}$$

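Cosine similarity measures the similarity between two vectors as the cosine of the angle $\theta$ between them. To measure distance rather than similarity, the cosine distance, defined as one minus the cosine similarity, is used:

$$d(\mathbf{p}, \mathbf{q}) = d(\mathbf{q}, \mathbf{p}) = 1 - \cos\theta = 1 - \frac{\mathbf{p} \cdot \mathbf{q}}{\lVert \mathbf{p} \rVert \, \lVert \mathbf{q} \rVert}. \tag{4}$$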
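Equations (3) and (4) translate directly into code. The sketch below is illustrative only; the example vectors are hypothetical Boolean permission vectors, not data from [26].

```python
import math
from typing import Sequence


def manhattan(p: Sequence[float], q: Sequence[float]) -> float:
    """Eq. (3): sum of absolute coordinate differences."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same dimension")
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


def cosine_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Eq. (4): 1 - (p . q) / (||p|| ||q||); undefined if either vector is all zeros."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same dimension")
    dot = sum(pi * qi for pi, qi in zip(p, q))
    norm_p = math.sqrt(sum(pi * pi for pi in p))
    norm_q = math.sqrt(sum(qi * qi for qi in q))
    if norm_p == 0 or norm_q == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_p * norm_q)


# Hypothetical Boolean permission vectors.
baseline = [1, 1, 0, 0]
app = [1, 1, 1, 1]
print(manhattan(app, baseline))                  # 2
print(round(cosine_distance(app, baseline), 3))  # 0.293
```

For 0/1 vectors each term $|p_i - q_i|$ equals $(q_i - p_i)^2$, so the Manhattan distance is simply the number of permissions on which two apps disagree (the square of the unweighted Euclidean distance in (1)), whereas the cosine distance is normalized by vector length and is therefore less sensitive to the overall number of permissions requested.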

Grace et al. [27] detail a prototype system, RiskRanker, which uses multi-order code analysis to determine the risk of malware in apps. RiskRanker is meant to be a proactive scheme that sifts through Android™ markets to spot zero-day malware without relying on malware specimens or their signatures.

Apps are analyzed and separated into potential risk levels:

• High Risk apps exhibit platform-level software vulnerabilities that could be used to compromise device integrity without proper authorization.

• Medium Risk apps can cause financial loss or disclose private information about the user, but do not exploit software vulnerabilities.

• Low Risk apps may collect general personal or device-specific information.

RiskRanker subjects apps to two sets of code analysis. The First Order Analysis is designed to expose High and Medium Risk apps. High Risk apps are detected by distilling known vulnerabilities into vulnerability signatures that capture the essential characteristics exhibited when each vulnerability is exploited. Apps are pre-processed to detect the presence of native code, and apps containing native code are checked for root exploits. Medium Risk apps are detected by making use of Android’s well-defined conditions for callback invocation. The assumption is that when malware intends to charge the device user or transmit sensitive data without the user’s knowledge, it is unlikely to seek consent via a callback handler that implies user interaction (such as a button click). RiskRanker performs static analysis on the Dalvik [28] bytecode contained in each app, using control-flow and data-flow analysis to unambiguously identify the callbacks that lead to a method of interest.

Dalvik is a virtual machine that runs applications and code written in Java™. Dalvik requires that control-flow graphs be reliably determinable in advance. Once code is loaded for execution, Dalvik employs a static verifier to ensure that methods and classes are well-formed and can be resolved. However, this verifier performs only intra-method data-flow analysis, determining the type of each virtual register at each point in the program [27]. Moreover, Android™ app components are not required to be invoked in any strict sequence, which gives apps a very complicated lifecycle. The control-flow graph is therefore traversed in reverse, searching for callback methods that do not imply user interaction. This backward-slicing approach, which determines where the arguments of a network call originate, exposes execution paths that may leak information.
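The backward traversal idea can be illustrated with a toy call graph. This is a simplified sketch, not RiskRanker’s implementation: RiskRanker slices Dalvik bytecode with control- and data-flow analysis, whereas the code below only performs backward reachability over a hand-written, hypothetical call graph. The method names are illustrative (`onClick` and `onReceive` merely mirror common Android callback names), and the set of callbacks treated as implying user interaction is an assumption. A sensitive sink reachable from an entry callback that does not imply user interaction is reported.

```python
from collections import deque

# Hypothetical call graph (caller -> callees); names are illustrative only.
CALL_GRAPH: dict[str, list[str]] = {
    "MainActivity.onClick": ["Billing.sendPremiumSms"],
    "BootReceiver.onReceive": ["Sync.start"],
    "Sync.start": ["Sync.upload", "Billing.sendPremiumSms"],
    "Sync.upload": ["HttpClient.execute"],
}
# Assumed classification of entry-point callbacks.
ENTRY_CALLBACKS = {"MainActivity.onClick", "BootReceiver.onReceive"}
USER_INTERACTION_CALLBACKS = {"MainActivity.onClick"}


def reverse_edges(graph: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert caller -> callees into callee -> callers."""
    callers: dict[str, list[str]] = {}
    for caller, callees in graph.items():
        for callee in callees:
            callers.setdefault(callee, []).append(caller)
    return callers


def callbacks_reaching(sink: str, graph: dict[str, list[str]], entries: set[str]) -> set[str]:
    """Breadth-first walk backward from `sink`, collecting entry callbacks encountered."""
    callers = reverse_edges(graph)
    seen, queue, found = {sink}, deque([sink]), set()
    while queue:
        node = queue.popleft()
        if node in entries:
            found.add(node)
        for caller in callers.get(node, []):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return found


def suspicious_callbacks(sink: str, graph: dict[str, list[str]]) -> set[str]:
    """Entry callbacks that reach `sink` without implying user interaction."""
    return callbacks_reaching(sink, graph, ENTRY_CALLBACKS) - USER_INTERACTION_CALLBACKS


print(sorted(suspicious_callbacks("Billing.sendPremiumSms", CALL_GRAPH)))
# ['BootReceiver.onReceive']
```

The premium-SMS sink is reachable both from a click handler and from a boot-time receiver; only the latter is flagged, because sending a premium SMS without any user-triggered callback on the path is the pattern the text describes as suspicious.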

The First Order Analysis excels at handling non-obfuscated apps, but may not detect malware that employs encryption or dynamically changes its payload. RiskRanker’s Second Order Analysis collects and correlates behaviors that are common among malware. Its first step is capturing distinctive behavior that is not necessarily malicious in and of itself but is abused by existing malware. One example is the inclusion of a child app [29] within the host app that can escalate privileges and even remain after the host has been uninstalled. Another example is the use of Java’s encryption APIs (application programming interfaces) [30] to encrypt data and communication. These behaviors are common in legitimate apps but can be abused by malware. To recognize them, the following information is automatically collected:
1. Child app location,

2.  Background  dynamic  code  loading  and  execution  paths, 

3.  Programmed  access  to  assets/res  directories, 

4.  Use  of  encryption  and  decryption  methods,  and 

5.  Native  code  execution  and  Java  Native  Interface  (JNI)  access. 

Malware often stores exploit code in encrypted form within the assets or res directories; the code is decrypted and read at runtime, which helps it evade the First Order Analysis. The assets and res directories are designed to contain art assets, user interface descriptions, and so on, but may also contain arbitrary data. Accessing these directories in a code path that also includes cryptographic and execution methods should be considered suspect, and storing native binaries in such an unusual location strongly suggests that an app is hiding something. Java™ provides built-in support for cryptography through the javax.crypto package. Its easy-to-use, standardized cryptographic functions make it an alluring package for malware developers (although in the future they may use different cryptography packages), so RiskRanker searches specifically for uses of its APIs. Encrypted native binaries will be run once they have been decrypted and loaded, so JNI calls and other invocations of native code are also noted.

Android™ provides functionality for loading and executing bytecode from an arbitrary source at runtime. Specifically, any app can use DexClassLoader to load classes from embedded .jar and .apk files. This kind of unsafe dynamic loading of untrusted Dalvik code is noted in the Second Order Analysis phase of RiskRanker. None of the Second Order Analysis behaviors by itself implies that an app is malware, and many legitimate apps use these functionalities. For example, many legitimate apps use dynamic loading to update functionality without reinstalling the app itself.
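A coarse pre-filter for several of these Second Order indicators can be sketched by inspecting an APK’s archive entries. This is an assumption-laden illustration, not RiskRanker: RiskRanker reasons about code paths, whereas this sketch only looks for native libraries under lib/, code-bearing files (.jar, .apk, .dex, .zip) or ELF binaries under assets/ and res/, and raw byte occurrences of the DexClassLoader and javax.crypto.Cipher type descriptors in top-level .dex files. The demo APK is synthetic, and the dictionary keys are hypothetical names.

```python
import io
import zipfile

ELF_MAGIC = b"\x7fELF"
EMBEDDED_CODE_EXTS = (".jar", ".apk", ".dex", ".zip")
# DEX files store referenced type descriptors as strings, so a byte search is a
# crude (and easily evaded) hint that these APIs are referenced.
SUSPICIOUS_DESCRIPTORS = (b"Ldalvik/system/DexClassLoader;", b"Ljavax/crypto/Cipher;")


def second_order_indicators(apk_bytes: bytes) -> dict[str, list[str]]:
    """Flag coarse, RiskRanker-inspired indicators from an APK's archive entries."""
    found: dict[str, list[str]] = {
        "native_libs": [], "embedded_code": [], "elf_in_assets_or_res": [], "api_references": [],
    }
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as apk:
        for name in apk.namelist():
            lower = name.lower()
            if lower.startswith("lib/") and lower.endswith(".so"):
                found["native_libs"].append(name)
            if lower.startswith(("assets/", "res/")):
                if lower.endswith(EMBEDDED_CODE_EXTS):
                    found["embedded_code"].append(name)
                with apk.open(name) as fh:
                    if fh.read(4) == ELF_MAGIC:
                        found["elf_in_assets_or_res"].append(name)
            if "/" not in name and lower.endswith(".dex"):  # top-level classes*.dex
                data = apk.read(name)
                for desc in SUSPICIOUS_DESCRIPTORS:
                    label = desc.decode()
                    if desc in data and label not in found["api_references"]:
                        found["api_references"].append(label)
    return found


# Build a tiny synthetic "APK" in memory for the demo.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("classes.dex", b"dex\n035\x00" + b"Ldalvik/system/DexClassLoader;")
    z.writestr("lib/armeabi-v7a/libgame.so", ELF_MAGIC + b"\x01\x01\x01")
    z.writestr("assets/update.jar", b"PK\x03\x04")
    z.writestr("res/raw/blob.bin", ELF_MAGIC + b"\x01\x01\x01")

report = second_order_indicators(buf.getvalue())
print(report)
```

As the text stresses, each hit is only a reason for closer inspection: a game may legitimately ship native libraries and load plug-in code, so such a report would feed a correlation step rather than a verdict.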

RiskRanker was tested on 188,318 apps collected over several months and successfully detected 718 malicious apps from 29 malware families, including 322 zero-day samples from 11 malware families. The authors point out that RiskRanker is not a panacea, for several reasons:

• Some malware families do not engage in the behavior that the system was designed to detect. For instance, malware that relies on social engineering attacks is difficult for an automated system to detect.

• Malware may not share the same payload as other members of its family.

• Some apps are “guilty by association,” such as an installer that is not itself malicious but is used to install a malicious payload.
4. BEHAVIORAL ANALYSIS

As far back as 2011, Burguera et al. [31] developed a framework for Android™ malware risk analysis that crowdsourced app behavior by collecting application execution traces. The framework relies on three main components: data acquisition, data manipulation, and a malware analysis and detection system. The data acquisition component collects data from the user’s device, including basic device information, a list of installed applications, and system call log files. The data manipulation component manages and parses all of the collected data; device information is stored in a central database, and system call traces are used to create a feature vector for analysis. Malware analysis and detection are performed by applying k-means clustering [32] to the crowdsourced system call vectors to create a “normality model” and detect anomalous behavior.
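The clustering step can be sketched with plain Lloyd’s k-means. The sketch assumes k = 2 (one cluster for normal behavior and one for anomalous behavior), a deterministic farthest-point initialization, and hypothetical per-user vectors counting four system calls (open, read, write, sendto) for the same app; it is not Crowdroid’s code.

```python
import math
from typing import Sequence


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two count vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points: list[Sequence[float]]) -> list[float]:
    """Coordinate-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points: list[Sequence[float]], k: int, max_iter: int = 100) -> list[int]:
    """Lloyd's k-means with farthest-point initialization; returns one label per point."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next centroid: the point farthest from all centroids chosen so far.
        centroids.append(list(max(points, key=lambda p: min(dist(p, c) for c in centroids))))
    labels: list[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = centroid(members)
    return labels


# Hypothetical per-user counts of (open, read, write, sendto) for one app.
traces = [
    [120, 300, 80, 2],
    [118, 295, 82, 3],
    [123, 305, 79, 1],
    [121, 298, 81, 45],
    [119, 302, 83, 41],
]
labels = kmeans(traces, k=2)
print(labels)  # [0, 0, 0, 1, 1]

# Assumption: the smaller cluster is the anomalous one.
minority = min(set(labels), key=labels.count)
print([i for i, lab in enumerate(labels) if lab == minority])  # [3, 4]
```

Here the two traces with many extra sendto calls form their own cluster. Treating the smaller cluster as suspicious is itself an assumption that holds only when most users run the clean version of the app.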

A lightweight app, Crowdroid, was developed and made available for installation from Google’s Market; it monitors Linux® kernel system calls and sends them to a centralized server. Only non-personal, behavior-related data is sent to the server for analysis. Once the data are parsed, a baseline feature vector of system calls is developed. System calls are how a program requests services from the OS kernel; they provide applications with network-, file-, and process-related operations. The Linux® kernel runs in the lowest layer of Android’s architecture, so all requests made from upper layers must pass through the kernel’s system call interface before being executed by the hardware. Specifically, Crowdroid uses the Linux® tool strace to collect system calls and generate an output file of all Android™ application events. This file provides useful information about each system call executed by an app (e.g., counts, opened and accessed files, and execution timestamps).
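Turning such a trace into a feature vector amounts to counting system call names. The sketch below assumes output in the format written by `strace -f -o FILE` (each line prefixed by a PID, optionally followed by a timestamp); the sample trace and the seven-call vocabulary are hypothetical, and resumed, signal, and exit lines are deliberately not counted.

```python
import re
from collections import Counter
from typing import Iterable

# Optional PID prefix, optional -t/-tt/-ttt timestamp, then "<syscall>(".
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(?:[\d:.]+\s+)?([a-z_][a-z0-9_]*)\(")


def syscall_counts(lines: Iterable[str]) -> Counter:
    """Count system call names; '<... resumed>', signal and exit lines never match."""
    counts: Counter = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def feature_vector(counts: Counter, vocabulary: list[str]) -> list[int]:
    """Fixed-order count vector; calls outside the vocabulary are ignored."""
    return [counts[name] for name in vocabulary]


TRACE = """\
1234  open("/data/data/com.example/prefs.xml", O_RDONLY) = 23
1234  read(23, "<?xml version=", 4096) = 412
1234  close(23) = 0
1240  socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 24
1240  sendto(24, "GET / HTTP/1.1", 14, 0, NULL, 0) = 14
1240  read(24,  <unfinished ...>
1234  write(1, "ok", 2) = 2
1240  <... read resumed> "HTTP/1.1 200 OK", 4096) = 15
1234  +++ exited with 0 +++
"""

VOCAB = ["open", "read", "write", "close", "socket", "connect", "sendto"]
counts = syscall_counts(TRACE.splitlines())
print(feature_vector(counts, VOCAB))  # [1, 2, 1, 1, 1, 0, 1]
```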

Burguera’s proposed framework and its app, Crowdroid, underwent limited experimental testing. Self-written programs were used to build a normality model, and the same programs were then modified to include self-written malware. The framework reportedly achieved a 100% success rate in detecting the self-written malware.

A second round of experimental testing featured two publicly available malware specimens:

1. PJApps, contained in the Steamy Windows app: Steamy Windows gives the smartphone screen the appearance of being covered with steam and lets the user wipe it off. The PJApps malware installs a background application and is programmed to perform multiple functions.

2. HongTautou trojan, contained within the Monkey Jump 2 app: Monkey Jump 2 is a game that is freely available through Google’s Play Store, but it has been repackaged with the HongTautou trojan and distributed through multiple third-party sites. HongTautou sends information and browses the Internet.

The experimental testing involved 20 clients running the Crowdroid app, with up to 60 user interactions with each app for malware discovery. Specifically, there were 60 interactions with the self-written apps and malware, which yielded a 100% detection rate. There were only eight interactions with the Steamy Windows app containing PJApps and 20 interactions with Monkey Jump 2 and the HongTautou trojan. Crowdroid achieved a 100% detection rate for PJApps, but only 85% for HongTautou. The lower detection rate for HongTautou is due to its making fewer system calls, since it only sends information and browses the Internet.
5. OPTIMIZATION OF RISK ANALYSIS PROCESSES


Android™ app markets, both the official Google Play Store and alternative markets, simplify consumer software distribution. However, analyzing the apps in these markets is a daunting task; the Google Play Store alone was approaching 2 million apps by the beginning of 2016 [33]. Chakradeo et al. propose Mobile Application Security Triage (MAST) [34], an optimization of Android™ app market malware evaluation processes that uses Multiple Correspondence Analysis (MCA) [35]. The triage process is analogous to the prioritization of patients in a hospital emergency room. Triage is neither diagnosis nor treatment; rather, it directs resources to where there is the greatest obvious need while delaying, or outright dismissing, the treatment of others. MAST was developed to provide a rank-ordered list identifying which apps in a market are suitable candidates for deeper, more costly analysis.

MCA is an extension of correspondence analysis that assists in analyzing the relationships among several categorical dependent variables; it is used with sets of observations described by a set of nominal variables [35]. MCA asks “individuals” a series of “questions” and maps each individual and answer to a set of coordinates on “principal axes,” where information is condensed so that most of it is reflected in just a few axes. These principal axes can also indicate “hidden variables” that give better insight into collections of answers; indeed, a skilled analyst can describe these hidden variables using the principal axes and the resulting point clusters.

Consider a point cloud in which individuals are described by their answers to N questions. When N is large, some dimensions of the cloud will be more interesting than others, and the most interesting dimensions may not be known a priori. MCA transforms a cloud of N-dimensional coordinates into principal components, where the first dimension is guaranteed to capture at least as much information as the second, the second at least as much as the third, and so on.

Two insights are used for transforming clouds into principal coordinates:

1. Scaling less common answers as more distinctive than more common answers, and

2. The variance of the data.
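Both insights can be seen at work by computing the first MCA axis by hand. The sketch below performs correspondence analysis on the 0/1 indicator matrix (one column per question-answer pair): dividing by the column masses inflates rare answers, and the leading singular vector of the standardized residual matrix, found here by power iteration, is the direction of greatest variance (inertia). The three yes/no questions and five apps are hypothetical, and this is a textbook formulation of MCA rather than MAST’s code.

```python
import math


def indicator_matrix(answers: list[tuple[str, ...]]) -> list[list[float]]:
    """Complete disjunctive table: one 0/1 column per (question, answer) pair seen."""
    n_q = len(answers[0])
    columns = [(q, a) for q in range(n_q) for a in sorted({row[q] for row in answers})]
    return [[1.0 if row[q] == a else 0.0 for q, a in columns] for row in answers]


def mca_first_axis(answers, iters: int = 1000, tol: float = 1e-12):
    """Return (row principal coordinates on axis 1, principal inertia of axis 1)."""
    z = indicator_matrix(answers)
    n, m = len(z), len(z[0])
    total = sum(map(sum, z))
    r = [sum(row) / total for row in z]                             # row masses
    c = [sum(z[i][j] for i in range(n)) / total for j in range(m)]  # column masses
    # Standardized residuals; dividing by sqrt(c_j) up-weights rare answers.
    s = [[(z[i][j] / total - r[i] * c[j]) / math.sqrt(r[i] * c[j]) for j in range(m)]
         for i in range(n)]

    def times_s(v):  # S v
        return [sum(s[i][j] * v[j] for j in range(m)) for i in range(n)]

    # Power iteration on S^T S converges to the leading right singular vector of S.
    v = [float(j + 1) for j in range(m)]  # arbitrary, non-symmetric start
    for _ in range(iters):
        sv = times_s(v)
        w = [sum(s[i][j] * sv[i] for i in range(n)) for j in range(m)]  # S^T S v
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    sv = times_s(v)                                       # first column of U * Sigma
    inertia = sum(x * x for x in sv)                      # sigma_1 squared
    coords = [sv[i] / math.sqrt(r[i]) for i in range(n)]  # F = D_r^(-1/2) U Sigma
    return coords, inertia


# Hypothetical questions: requests SEND_SMS? ships native code? requests INTERNET?
apps = {
    "app_a": ("no", "no", "yes"),
    "app_b": ("no", "no", "yes"),
    "app_c": ("no", "no", "yes"),
    "app_d": ("no", "yes", "yes"),
    "app_e": ("yes", "yes", "no"),
}
coords, inertia = mca_first_axis(list(apps.values()))
for name, f in sorted(zip(apps, coords), key=lambda t: -abs(t[1])):
    print(f"{name}: {f:+.3f}")
print(f"principal inertia of axis 1: {inertia:.3f}")
```

For this toy questionnaire the first axis has principal inertia 5/6 (out of a total inertia of 1) and places app_e, the only app answering “yes” to the rare SMS question, about three times farther from the origin than apps a to c. The sign of an MCA axis is arbitrary, so only magnitudes and relative positions matter.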

Mobile Application Security Triage (MAST) uses MCA to rank Android™ apps in order of relative suspicion. MAST works as follows:

1. App attributes that define interesting security properties are identified.

2. Related attribute sets are combined to create an MCA questionnaire.

3. MCA is run over multiple polls to generate rough indications of suspicious behavior.

4. The rough indicators are merged to create a ranking of application suspiciousness.


Mobile Application Security Triage uses easy-to-obtain attribute information from the application manifest and package contents, without using market-specific metadata. Specifically, MAST considers four attributes:

1. Permissions restrict access to security-sensitive operations. MAST considers only the 114 permissions defined by the Android™ framework, though custom permissions could also be considered.

2. Intents provide interfaces both to core platform functionality and for interactions between third-party apps. Intent filters handle intents by specifying pre-agreed-upon action strings to customize user experiences. MAST takes into account the 92 action strings defined in the Android™ framework.

3. Native code allows third-party developers to include native libraries in their apps. MAST considers only whether or not an app contains native libraries.

4. Zip files (.zip) have no restrictions on what they may hold, and MAST considers their presence or absence in an app’s main archive.
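The four attribute types can be encoded as yes/no questionnaire answers, which is the form MCA consumes. In this sketch the permission and action vocabularies are tiny hypothetical subsets of the 114 permissions and 92 action strings, the manifest is assumed to be already decoded to plain XML, native code is detected as any lib/*.so archive entry, and the question labels are hypothetical; MAST’s own encoding may differ.

```python
import xml.etree.ElementTree as ET

ANDROID_NAME = "{http://schemas.android.com/apk/res/android}name"


def mast_answers(manifest_xml: str, entries: list[str],
                 permission_vocab: list[str], action_vocab: list[str]) -> dict[str, str]:
    """Encode MAST's four attribute types as yes/no questionnaire answers."""
    root = ET.fromstring(manifest_xml)
    perms = {e.get(ANDROID_NAME) for e in root.iter("uses-permission")}
    actions = {a.get(ANDROID_NAME)
               for f in root.iter("intent-filter") for a in f.iter("action")}
    answers = {f"perm:{p}": p in perms for p in permission_vocab}
    answers.update({f"action:{a}": a in actions for a in action_vocab})
    answers["native_code"] = any(n.startswith("lib/") and n.endswith(".so") for n in entries)
    answers["zip_file"] = any(n.lower().endswith(".zip") for n in entries)
    return {question: "yes" if present else "no" for question, present in answers.items()}


MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.flashlight">
  <uses-permission android:name="android.permission.RECEIVE_BOOT_COMPLETED"/>
  <uses-permission android:name="android.permission.SEND_SMS"/>
  <application>
    <receiver android:name=".BootReceiver">
      <intent-filter>
        <action android:name="android.intent.action.BOOT_COMPLETED"/>
      </intent-filter>
    </receiver>
  </application>
</manifest>"""

ENTRIES = ["AndroidManifest.xml", "classes.dex", "lib/armeabi/libx.so", "assets/payload.zip"]
PERMS = ["android.permission.INTERNET", "android.permission.SEND_SMS"]
ACTIONS = ["android.intent.action.BOOT_COMPLETED", "android.intent.action.MAIN"]

answers = mast_answers(MANIFEST, ENTRIES, PERMS, ACTIONS)
for question, answer in answers.items():
    print(f"{question}: {answer}")
```

Each app in a market would yield one such row of answers, and the rows together form the questionnaire over which MCA is run.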